Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge
Abstract
Recent improvements in text-to-speech (TTS) and voice conversion (VC) techniques present a threat to automatic speaker verification (ASV) systems: an attacker can use TTS or VC systems to impersonate a target speaker’s voice. To address this threat, this paper studies the detection of such synthetic speech, called spoofing speech. We propose to use high-dimensional magnitude-based and phase-based features together with long-term temporal information. In total, two types of magnitude-based features and five types of phase-based features are used. For each feature type, we build a component system that uses a multilayer perceptron to predict the posterior probability that the input features were extracted from spoofing speech. The probabilities from all component systems are averaged to produce the score for the final decision. On the ASVspoof 2015 benchmarking task, an equal error rate (EER) of 0.29% is obtained for known spoofing types, which demonstrates the high effectiveness of the seven features used. For unknown spoofing types, the EER is much higher, at 5.23%, suggesting that future research should focus on improving the generalization of the techniques.
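The score-level fusion described in the abstract (averaging the posteriors of the component systems) and the EER metric used to report results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fuse_scores` and `equal_error_rate` are hypothetical, the input is assumed to be one row of utterance-level spoofing posteriors per component system, and the EER is approximated by sweeping thresholds over the observed scores and taking the point where the miss and false-alarm rates are closest.

```python
import numpy as np


def fuse_scores(component_posteriors):
    """Average spoofing posteriors across component systems (hypothetical helper).

    component_posteriors: shape (n_systems, n_utterances); each entry is assumed
    to be one feature-specific classifier's posterior that the utterance is spoofed.
    Returns one fused score per utterance.
    """
    p = np.asarray(component_posteriors, dtype=float)
    return p.mean(axis=0)


def equal_error_rate(scores, labels):
    """Approximate EER by a threshold sweep (hypothetical helper).

    Higher score means "more likely spoofed"; labels: 1 = spoof, 0 = genuine.
    Returns the average of the miss and false-alarm rates at the threshold
    where the two are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    spoof = scores[labels == 1]
    genuine = scores[labels == 0]
    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        miss = np.mean(spoof < t)             # spoofed utterances not flagged
        false_alarm = np.mean(genuine >= t)   # genuine utterances flagged as spoof
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


if __name__ == "__main__":
    # Two hypothetical component systems scoring four utterances.
    posteriors = [[0.9, 0.8, 0.2, 0.1],
                  [0.7, 0.9, 0.3, 0.2]]
    fused = fuse_scores(posteriors)
    print(fused, equal_error_rate(fused, [1, 1, 0, 0]))
```

Multiplying the returned EER by 100 expresses it as a percentage, the unit used for the 0.29% and 5.23% figures above. Plain averaging treats every feature-specific system as equally reliable; a weighted fusion would be a straightforward variation.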